1. Introduction

A main issue in data mining is the feature extraction of a large set of curves.
Indeed, classification methods make it possible to split the data into different
homogeneous groups, each representing a specific mass behavior. But, within one
group, the observations differ slightly from one another. Such variations
reflect the variability of the individuals inside one group. More precisely,
there is a mean pattern such that each observed curve is warped from this
archetype by a warping function; see for example [TB].

In this work, we focus on the particular case where the individuals usually
experience similar events, which are explained by a common pattern, but where
the starting time of the event occurs sooner or later. Classification methods,
like repeated measures ANOVA or Principal Components Analysis of curves (see
for instance [17]), ignore this type of variability. Hence, computing a
representative curve for each group severely distorts the analysis of the data.
Indeed, the average curve (usually the mean or the median) oversmooths the
studied phenomenon, and is not a good description of reality.

In our work, we restrict ourselves to the case where all the curves can be
deduced from one another by a shift parameter. Hence, we consider the
following model: for $j = 1, \dots, J$ and $i = 1, \dots, n_j$, we observe

$Y_{ij} = f(t_{ij} - \theta_j^*) + \sigma \varepsilon_{ij}$, (1)

where $J$ stands for the number of curves $C_j$, while $n_j$ is the number of
observations for the $j$-th individual. The values $t_{ij}$ are observation
times, which are assumed to be known. The unobserved warping effects
$\theta_j^*$, $j = 1, \dots, J$, are shift parameters which translate the
unknown function $f$. We also choose $t_{ij} = t_i$ and $n_j = n$, which means
that all curves are observed at the same times and the same number of times.
The errors $\varepsilon_{ij}$, for $(i, j) \in \{1, \dots, n\} \times \{1, \dots, J\}$,
are i.i.d. with distribution $\mathcal{N}(0, 1)$. Moreover, without loss of
generality, we assume in the following that $\sigma = 1$ (see Remark 3.2). We
aim at estimating the shift parameters $\theta_j^*$, $j = 1, \dots, J$, in
order to find a good representative of the feature $f$.

A more general problem has been tackled in the literature, and some work has
been done to find a representative of a large sample of close enough functions
$f_j$, $j = 1, \dots, J$; see for example [18]. Indeed, in a general case, we
observe realizations $y_{ij}$, $j = 1, \dots, J$, $i = 1, \dots, n_j$, from the
model

$Y_{ij} = f_j(t_{ij}) + \varepsilon_{ij}$, (2)

where $\varepsilon_{ij}$, $j = 1, \dots, J$, $i = 1, \dots, n_j$, are i.i.d.
random variables representing the observation noise. The functions $f_j$,
$j = 1, \dots, J$, are close to each other in the sense that there exist an
unknown archetype $f$ and unknown warping functions $h_j$, $j = 1, \dots, J$,
such that, for all $j = 1, \dots, J$,

$\forall t \in [0, T], \quad f_j(t) = f \circ h_j(t)$.

Examples of such data might be growth curves, longitudinal data in medicine, 
speech signals, traffic data or expenditure curves for some goods in the econo- 
metric domain. Our main motivation in this paper is the analysis of the vehicle 
speed evolution on a motorway. The data are curves, describing the evolution, 
on observation cells, of the daily vehicle speed. After performing classification 
procedures (see for instance [14] for a complete study), we obtain clusters of 
functions, each one representing a typical common behavior. Indeed, all the 
curves can be deduced one from another by a shift parameter. 

This kind of issue led several statisticians to apply transformations to
functions in order to get rid of the shifts and to align the curves. If a
parametric model were available a priori, the analysis would be made easier.
But, if the data are numerous, there is generally not enough knowledge to build
such a model. Thus, one turns to a non parametric framework. When the pattern
is known, the problem reduces to aligning a noisy observation with a fixed
feature. Piccioni, Scarlatti and Trouve in [15], Kneip, Li, MacGibbon and
Ramsay in [11], or Ramsay and Li in [16] proposed curve registration methods.
Their main idea is to align each curve on a target curve $f_0$, which means
finding, for all $j \in \{1, \dots, J\}$, the warping function $h_j$ minimizing

$F_\lambda(f_0, f_j; h_j) = \int \| f_j \circ h_j(t) - f_0(t) \|^2 \, dt + \lambda \int w_j^2(t) \, dt$,

where $h_j$ belongs to a particular smooth monotone family defined by the
solution of the differential equation $D^2 h_j = w_j D h_j$. Hence, $w_j$ is
simply $D^2 h_j / D h_j$, the relative curvature of $h_j$. Thus, penalizing
$w_j$ yields both smoothness and monotonicity of $h_j$ (see [16] for more
details). The main drawback of such methods is that they assume that the
archetype $f_0$ is known, which is a reasonable assumption in pattern
recognition, but which is unrealistic when the observed phenomenon is not well
known, as in our study. Alternatively, from a non parametric point of view, the
pattern is replaced by its estimate. In this case, the issue is a matter of
synchronizing sample curves. Wang and Gasser in [21] use kernel estimators. In
another work, Gasser and Kneip, in [5], align the curves by aligning the local
extrema of the functions, which are estimated as zeroes of the non parametric
estimate of the derivative. In all cases, the issue of estimating the shifts is
blurred by the estimation of the curves, which leads to non parametric rates of
convergence.

Hence, it seems natural to study our regression problem (1) in a
semi-parametric framework: the shifts are the parameters of interest to be
estimated, while the pattern stands for an unknown nuisance functional
parameter. A very general semi-parametric regression model called
Self-modelling regression (SEMOR) has been considered in [10]. The model is
$f_j(\cdot) = f(\cdot, \theta_j^*)$, $j \in \{1, \dots, J\}$, and a general
backfitting algorithm is studied. Roughly speaking, after initializing an
estimate of $f$ by a first guess (using for example a kernel method), this
algorithm is based on two recursive steps. In the first step, the estimation of
$\theta_j^*$, $j = 1, \dots, J$, is performed. In the second step, the estimate
of $f$ is updated. In both steps, estimations are performed using a least
squares criterion. In [10] a complete study, including the asymptotic normality
of the estimates, is performed for the Shape-invariant model (SIM) introduced
in [12]. See also [13], [8] and [9] for related works. Actually, the model
studied in our paper (regression model (1)) falls within the SIM framework, so
that the methods studied in [10] may be applied. Nevertheless, the estimation
procedure developed here is new, structurally simpler and computationally
easier to implement than the complicated backfitting algorithms.

The difficulty of the work is that the estimation must not rely on the pattern,
even though the quantities are deeply linked. That is the reason why we will
use an M-estimator built on the Fourier series of the data. Under
identifiability assumptions, we provide a consistent method (Theorem 2.1) to
estimate, at the parametric rate of convergence, the shifts $\theta_j^*$,
$j = 1, \dots, J$, when $f$ is unknown, and we show that the fluctuations of
the estimates are asymptotically Gaussian (Theorem 3.1). Further, our
estimation method leads to a fast algorithm to align shifted curves without any
prior assumption on the feature, thanks to semi-parametric techniques. We point
out that this study can be linked first with the study of Golubev in [7],
dealing with the semi-parametric efficiency in the estimation of shifts in a
continuous observation scheme, and also with the study of Gassiat and
Levy-Leduc in [6], dealing with the estimation of the periodicity of a signal.
Further, the mixed effects model (1) with random shifts is studied in pQ (see
also [2]). We stress that the method we propose handles a large variety of
curves with minimal smoothness properties; namely, we only require $L^1$
conditions.

The present paper falls into six parts. Section 2 is devoted to the definition
of the model and to the description of the estimation method. In Section 3 we
provide asymptotic properties of the estimators: we show that they are
consistent and asymptotically Gaussian. The estimation method is then applied
in Section 4 to some simulated data, and used to analyze road traffic data. We
compare our results to another existing method. The technical lemmas and the
proofs are gathered in Section 5 and Section 6.

2. Semi-parametric estimation of shifts 
2.1. Model 

For the $j$-th curve ($j = 1, \dots, J$), we get $n$ observations $y_{ij}$,
$i = 1, \dots, n$, measured at equispaced times $t_i = \frac{i-1}{n} T \in [0, T[$,
with $T \in \mathbb{R}_+^*$. We model these observations in the following way:

$Y_{ij} = f(t_i - \theta_j^*) + \varepsilon_{ij}$, $j = 1, \dots, J$, $i = 1, \dots, n$, (3)

where $f : \mathbb{R} \to \mathbb{R}$ is an unknown $T$-periodic function,
$\theta^* = (\theta_1^*, \dots, \theta_J^*) \in \mathbb{R}^J$ is an unknown
shift parameter, $\theta_j^*$ is the shift of the $j$-th curve, and
$(\varepsilon_{ij})$, $i = 1, \dots, n$, is a Gaussian white noise with
variance 1. For the sake of simplicity, we consider a unit variance, but all
our results remain valid for a general variance.

Our aim is to estimate the translation factors $(\theta_j^*)$ without knowledge
of the pattern $f$. Due to the special structure of the model, Fourier analysis
is well suited to conduct such a study, since the Fourier basis diagonalizes
any translation. Then, using a Discrete Fourier Transform, we may transform the
model (3) into the following one (supposing $n$ is odd):

$d_{jl} = e^{-i l \alpha_j^*} c_l(f) + w_{jl}$, $j = 1, \dots, J$, $l = -(n-1)/2, \dots, (n-1)/2$, (4)

where $c_l(f) = \frac{1}{n} \sum_{m=1}^{n} f(t_m) e^{-i 2\pi \frac{m l}{n}}$,
$l = -(n-1)/2, \dots, (n-1)/2$, are the discrete Fourier coefficients,
$\alpha_j^* = \frac{2\pi}{T} \theta_j^* \in \mathbb{R}$, $j = 1, \dots, J$, are
the phase factors, and, for all $j \in \{1, \dots, J\}$, $(w_{jl})$,
$l = -(n-1)/2, \dots, (n-1)/2$, is a complex Gaussian white noise, with complex
variance $1/n$, and with independent real and imaginary parts. As previously,
our goal is to estimate the phase factors $\alpha_j^*$, $j = 1, \dots, J$,
without knowledge of the Fourier coefficients of the function $f$. Stricto
sensu, the discrete Fourier coefficients are not the Fourier coefficients of
the function, but the bias induced is similar to the bias induced by any
discretization in regression, which vanishes under some regularity assumptions,
as shown in 0]. Hence, from now on, we will consider the model (4) with
$c_l(f) = \frac{1}{T} \int_0^T f(t) e^{-i 2\pi \frac{l t}{T}} \, dt$. Observe
that in this last equation we only have to assume that $f$ is integrable.
Hence, if we only consider the discretized version given in Model 2.2, only a
minimal smoothness condition ($f \in L^1(\mathbb{R})$) is necessary, contrary
to other methods in statistics, which require stronger conditions.

We point out that we are facing a semi-parametric model. Indeed, we aim at
estimating the parameter $\alpha^* = (\alpha_1^*, \dots, \alpha_J^*)$ in a
model that also depends on an unknown nuisance functional parameter
$(c_l(f))_{l \in \mathbb{Z}}$, the Fourier coefficients of the unknown
function $f$.
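
The phase identity behind model (4) can be checked numerically. The following is a minimal sketch rather than part of the original study: it assumes a hypothetical trigonometric-polynomial pattern (so that the discrete coefficients are free of aliasing), a grid indexed from $m = 0$, and no noise. The names `dft_coefficients` and `f` are illustrative only.

```python
import cmath
import math


def dft_coefficients(y):
    """Centered DFT of an odd-length sample:
    c_l = (1/n) * sum_m y[m] * exp(-2i*pi*m*l/n), l = -(n-1)/2 .. (n-1)/2."""
    n = len(y)
    half = (n - 1) // 2
    return {
        l: sum(v * cmath.exp(-2j * math.pi * m * l / n) for m, v in enumerate(y)) / n
        for l in range(-half, half + 1)
    }


def f(t):
    # Hypothetical 2*pi-periodic pattern (assumption made for this illustration).
    return 1.0 + 2.0 * math.cos(t) + math.sin(3.0 * t)


n, T, theta = 31, 2.0 * math.pi, 0.7
alpha = 2.0 * math.pi * theta / T                 # phase factor of the shift
times = [m * T / n for m in range(n)]             # grid indexed from m = 0

c = dft_coefficients([f(t) for t in times])            # c_l(f)
d = dft_coefficients([f(t - theta) for t in times])    # noiseless d_l

# Largest deviation between d_l and exp(-i*l*alpha) * c_l(f) over all l.
max_gap = max(abs(d[l] - cmath.exp(-1j * l * alpha) * c[l]) for l in c)
```

In this noiseless, band-limited setting the deviation `max_gap` is at the level of rounding error; for a general pattern the identity only holds up to the discretization bias mentioned above.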



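2.2. Identifiability

We notice that the model (4) is not identifiable for all translation
parameters. Indeed, replacing $\alpha^*$ by

$(\alpha_1, \dots, \alpha_J)^\top = (\alpha_1^*, \dots, \alpha_J^*)^\top + c \, (1, \dots, 1)^\top + 2\pi \, (k_1, \dots, k_J)^\top$, $c \in \mathbb{R}$, $(k_1, \dots, k_J)^\top \in \mathbb{Z}^J$, (5)

and replacing $f(\cdot)$ by $f(\cdot - c)$ leaves equation (4) invariant. So,
in order to ensure identifiability of the model, we restrict the parameter
space $A$:

i) $A$ is compact,

ii) $\alpha^* \in A$,

iii) if $\alpha \in A$ and (5) holds for $\alpha$, then $\alpha = \alpha^*$. (6)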



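In this paper, we will mainly consider, in Theorem 3.1, the parameter set
$A_1 = \{\alpha \in [-\pi, \pi[^J : \alpha_1 = 0\}$. Hence, in (5) the constant
$c$ must be equal to 0. Our fluctuation theorem can easily be transposed to
other choices of parameter spaces, for example
$A_2 = \{\alpha \in [-\pi, \pi[^J : \sum_{j=1}^J \alpha_j = 0 \text{ and } \alpha_1 \in [0, 2\pi/J[\}$.
In this last case, the condition $\sum_{j=1}^J \alpha_j = 0$ implies in (5)
that $c = -\frac{2\pi}{J} \sum_{j=1}^J k_j$. So that, with equation (5), we can
write that

$\alpha = \alpha^* + \frac{2\pi}{J} \begin{pmatrix} (J-1) & -1 & \cdots & -1 \\ -1 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & -1 \\ -1 & \cdots & -1 & (J-1) \end{pmatrix} \begin{pmatrix} k_1 \\ \vdots \\ k_J \end{pmatrix}$.

Hence, we get $J$ different solutions in $[-\pi, \pi[^J \subset \mathbb{R}^J$,
and a unique solution with the additional condition $\alpha_1 \in [0, 2\pi/J[$.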
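
The matrix form above is just (5) with $c = -\frac{2\pi}{J} \sum_j k_j$ substituted in. As a quick arithmetic check, the sketch below computes both expressions for hypothetical values of $\alpha^*$ and $(k_1, \dots, k_J)$ (these numbers, and the function names, are assumptions chosen for the illustration) and compares them.

```python
import math


def shift_by_matrix(alpha_star, k):
    """alpha* + (2*pi/J) * M k, where M has (J-1) on the diagonal and -1 elsewhere.
    Row j of M k equals J*k_j - sum(k)."""
    J = len(k)
    total = sum(k)
    return [a + 2.0 * math.pi / J * (J * kj - total) for a, kj in zip(alpha_star, k)]


def shift_by_constant(alpha_star, k):
    """alpha* + c + 2*pi*k with c = -(2*pi/J) * sum(k), as in (5)."""
    J = len(k)
    c = -2.0 * math.pi / J * sum(k)
    return [a + c + 2.0 * math.pi * kj for a, kj in zip(alpha_star, k)]


alpha_star = [0.4, -0.1, -0.3]   # hypothetical phases summing to zero
k = [2, -1, 3]                   # hypothetical integers

via_matrix = shift_by_matrix(alpha_star, k)
via_constant = shift_by_constant(alpha_star, k)
```

Both lists agree up to rounding, and the transformed phases still sum to zero, which is the constraint that defines $A_2$.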



2.3. Estimation

Since we want to estimate the shifts without prior knowledge of the function
$f$, we will consider a semi-parametric method relying on an M-estimation
procedure. Hence, the functional parameter is a nuisance parameter that does
not play a role in the rate of convergence of the estimates of the parameters,
regardless of smoothness conditions for $f$.

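For this, define, for any $\alpha = (\alpha_1, \dots, \alpha_J) \in A$, the
rephased coefficients

$\tilde{c}_{jl}(\alpha) = e^{i l \alpha_j} d_{jl}$, $j = 1, \dots, J$, $l = -(n-1)/2, \dots, (n-1)/2$,

and the mean of these rephased Fourier coefficients

$\hat{c}_l(\alpha) = \frac{1}{J} \sum_{j=1}^{J} \tilde{c}_{jl}(\alpha)$, $l = -(n-1)/2, \dots, (n-1)/2$.

We have that $\tilde{c}_{jl}(\alpha^*) = c_l(f) + e^{i l \alpha_j^*} w_{jl}$,
for all $j \in \{1, \dots, J\}$, and

$\hat{c}_l(\alpha^*) = c_l(f) + \frac{1}{J} \sum_{j=1}^{J} e^{i l \alpha_j^*} w_{jl}$.

Hence, $|\tilde{c}_{jl}(\alpha) - \hat{c}_l(\alpha)|^2$ should be small when
$\alpha$ is close to $\alpha^*$.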
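Now, consider a bounded measure $\mu$ on $[0, T]$ and let
$(\delta_l)_{l \in \mathbb{Z}}$ denote the sequence of its Fourier
coefficients. Obviously, the sequence $(\delta_l)$ is bounded. Without loss of
generality we will assume that $\delta_0 = 0$. Assume further that
$\sum_l |\delta_l|^2 |c_l(f)|^2 < +\infty$, so that $f * \mu$ is a well defined
square integrable function:

$f * \mu(x) = \int f(x - y) \, d\mu(y)$.

Consider the following empirical contrast function:

$M_n(\alpha) = \frac{1}{J} \sum_{j=1}^{J} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 \, |\tilde{c}_{jl}(\alpha) - \hat{c}_l(\alpha)|^2$. (7)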

In the sequel, we will always assume that:

The set $\{l : \delta_l c_l(f) \neq 0\}$ contains at least two integers which are coprime. (8)

The random function $M_n$ is non negative. Furthermore, its minimum value
should be reached close to the true parameter $\alpha^*$. Then, the following
theorem provides the consistency of the M-estimator, defined by

$\hat{\alpha}_n = \arg\min_{\alpha \in A} M_n(\alpha)$.
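
To make the definition concrete, here is a minimal sketch of the contrast (7) and of its minimization for $J = 2$ curves with $\alpha_1 = 0$. It is an illustration under stated assumptions, not the implementation used in the paper: the pattern is a hypothetical trigonometric polynomial, the weights are $\delta_l = 1/|l|^{1.3}$ with $\delta_0 = 0$, and a plain grid search replaces the conjugate gradient method.

```python
import cmath
import math
import random


def dft_coefficients(y):
    """Centered DFT of an odd-length sample: {l: (1/n) sum_m y[m] e^{-2i pi m l/n}}."""
    n = len(y)
    half = (n - 1) // 2
    return {
        l: sum(v * cmath.exp(-2j * math.pi * m * l / n) for m, v in enumerate(y)) / n
        for l in range(-half, half + 1)
    }


def contrast(alpha, d, weights):
    """Empirical contrast M_n(alpha) of (7) for coefficients d[j][l]."""
    J = len(d)
    total = 0.0
    for l, delta in weights.items():
        rephased = [cmath.exp(1j * l * a) * dj[l] for a, dj in zip(alpha, d)]
        mean = sum(rephased) / J
        total += abs(delta) ** 2 * sum(abs(c - mean) ** 2 for c in rephased)
    return total / J


def pattern(t):
    # Hypothetical 2*pi-periodic pattern; its frequencies 1 and 2 are coprime.
    return 6.0 * math.cos(t) + 4.0 * math.sin(2.0 * t)


random.seed(0)
n, theta_true = 101, math.pi / 3            # T = 2*pi, so alpha = theta
times = [2.0 * math.pi * m / n for m in range(n)]
curves = [[pattern(t - th) + random.gauss(0.0, 1.0) for t in times]
          for th in (0.0, theta_true)]
d = [dft_coefficients(y) for y in curves]
weights = {l: (0.0 if l == 0 else abs(l) ** -1.3) for l in d[0]}

# Grid search over alpha_2 in [-pi, pi[ with alpha_1 = 0 (parameter set A_1).
grid = [-math.pi + 2.0 * math.pi * g / 400 for g in range(400)]
alpha_hat = min(grid, key=lambda a: contrast([0.0, a], d, weights))
```

With more than two curves the grid search becomes impractical, which is one reason a gradient-based minimizer is the natural choice in practice.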



F. Gamboa et al. / Semi- parametric estimation of shifts 






Theorem 2.1 Under the following assumptions on $f$ and on the weight sequence
$(\delta_l)_{l \in \mathbb{Z}}$:

$\sum_{l} |\delta_l|^2 |c_l(f)|^2 < +\infty$ and $\sum_{l} |\delta_l|^4 |c_l(f)|^2 < +\infty$, (9)

we have that $\hat{\alpha}_n \xrightarrow[n \to +\infty]{P_{\alpha^*}} \alpha^*$.

We point out that we only assume that $f * \mu$ and $f * \mu * \mu$ are square
integrable, and yet are able to build estimates of the shifts in the model (4).
The computation of the estimator is quick since only a Fast Fourier Transform
algorithm and a minimization algorithm for a quadratic functional are needed.

Proof 2.2 (Proof of Theorem 2.1) The proof of this theorem follows the
classical guidelines of the convergence of M-estimators (see for example [19]).
Indeed, the contrast is split into two parts, a deterministic one and a random
one. Then, it suffices to show that the following conditions hold for the
criterion function to ensure consistency of $\hat{\alpha}_n$.

i) Convergence to a contrast function:

$M_n(\alpha) \xrightarrow[n \to +\infty]{P_{\alpha^*}} K(\alpha)$, $\alpha \in A$, (10)

where $K(\cdot)$ has a unique minimum at $\alpha^*$.
ii) Define the modulus of uniform continuity $W$ by

$W(n, \eta) = \sup_{\|\alpha - \beta\| < \eta} |M_n(\alpha) - M_n(\beta)|$.

There exist two sequences $(\eta_k)_{k \in \mathbb{N}}$ and
$(\epsilon_k)_{k \in \mathbb{N}}$, decreasing to zero, such that for $k$ large
enough, we have

$\lim_{n \to +\infty} P_{\alpha^*}(W(n, \eta_k) > \epsilon_k) = 0$. (11)

These two conditions are fulfilled, as is proved in Section 6. Notice that we
chose to favor the uniform convergence of the modulus of continuity of the
contrast rather than the uniform convergence of the criterion itself.
Nevertheless, the two proofs use the same kind of arguments, i.e. proving the
uniform convergence of empirical processes of Gaussian variables.



3. Asymptotic normality 

In this section, we prove that the estimator built in the previous section is
asymptotically Gaussian, and we give its asymptotic covariance matrix. In
general, the asymptotic covariance matrix strongly depends on the geometric
structure of the parameter space $A$. So, for the sake of simplicity, we study
the asymptotic normality for the parameter space $A_1$. Hence, the parameter
space has dimension $J - 1$, and we rewrite this set as
$A_1 = [-\pi, \pi[^{J-1}$ and any element of $A_1$ as
$\alpha = (\alpha_2, \dots, \alpha_J)$. Also, for the sake of simplicity, in
this section and in the proof of Theorem 3.1 we will write $M_n(\alpha)$
instead of $M_n(0, \alpha_2, \dots, \alpha_J)$. So, we consider any estimator
defined by

$\hat{\alpha}_n = \arg\min_{\alpha \in A_1} M_n(\alpha)$.



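Theorem 3.1 Under the following assumptions on $(\delta_l)_{l \in \mathbb{Z}}$:

$0 < \sum_{l \in \mathbb{Z}} |\delta_l|^2 l^2 |c_l(f)|^2 < \infty$, $\quad \sum_{l \in \mathbb{Z}} |\delta_l|^2 l^4 |c_l(f)|^2 < \infty$, $\quad \sum_{l=-n}^{n} |\delta_l|^4 l^4 = o(n^2)$, (12)

we get that

$\sqrt{n} \, (\hat{\alpha}_n - \alpha^*) \xrightarrow[n \to +\infty]{\mathcal{D}} \mathcal{N}_{J-1}(0, \Gamma)$,

with

$\Gamma = \frac{\sum_{l \in \mathbb{Z}} |\delta_l|^4 l^2 |c_l(f)|^2}{\left( \sum_{l \in \mathbb{Z}} |\delta_l|^2 l^2 |c_l(f)|^2 \right)^2} \, (I_{J-1} + U_{J-1})$, (13)

where $I_{J-1}$ is the identity matrix of dimension $J - 1$, and $U_{J-1}$ is
the square matrix of dimension $J - 1$ whose entries are all equal to one.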

Remark 3.2 If the white noise in the model (3) has a variance equal to
$\sigma^2$, then the limit distribution in the previous theorem has a
covariance matrix equal to $\sigma^2 \Gamma$.

Proof 3.3 (Proof of Theorem 3.1) Recall that the M-estimator is defined as the
minimum of the criterion function $M_n(\alpha)$. Hence, we get

$\nabla M_n(\hat{\alpha}_n) = 0$,

where $\nabla$ is the gradient operator. A second order expansion leads to:
there exists $\bar{\alpha}_n$ in a neighborhood of $\alpha^*$ such that

$\sqrt{n} \, (\hat{\alpha}_n - \alpha^*) = -[\nabla^2 M_n(\bar{\alpha}_n)]^{-1} \sqrt{n} \, \nabla M_n(\alpha^*)$, (14)

where $\nabla^2$ is the Hessian operator. Now, using the two asymptotic results
from Proposition 5.1 and from Proposition 5.2, we get

$\sqrt{n} \, \nabla M_n(\alpha^*) \xrightarrow[n \to +\infty]{\mathcal{D}} \mathcal{N}_{J-1}(0, \Gamma_0)$,

$[\nabla^2 M_n(\bar{\alpha}_n)]^{-1} \xrightarrow[n \to +\infty]{P_{\alpha^*}} V$,

where $V$ is a non negative symmetric matrix of dimension $J - 1$. Hence, if we
set $\Gamma = V^\top \Gamma_0 V$, the result of Theorem 3.1 follows easily.
Finally, we see that $\sum_{l=-n}^{n} |\delta_l|^4 l^4 = o(n^2)$ implies that
$\sum_{l=-n}^{n} |\delta_l|^4 l^2 = o(n)$. Indeed,
$\frac{1}{n^2} \sum_{l=-n}^{n} |\delta_l|^4 l^4 \geq n^2 |\delta_n|^4 + n^2 |\delta_{-n}|^4$,
so $\lim_{|n| \to +\infty} |\delta_n|^2 |n| = 0$. Hence
$\frac{1}{n} \sum_{l=-n}^{n} |\delta_l|^4 l^2 = o(1)$. Moreover,
$\sum_l |\delta_l|^2 l^4 |c_l(f)|^2 < +\infty$ implies Assumption (19). So the
set of assumptions (12) implies those of both Propositions 5.2 and 5.1.

Observe that the extra terms $(\delta_l)_{l \in \mathbb{Z}}$ used in the
definition (7) smooth the criterion function $M_n(\alpha)$. Indeed, without
these terms, i.e. under the choice $\delta_l = 1$, the random part of the
derivative of the criterion function does not converge towards a deterministic
function but to a random process, which prevents the study of the asymptotic
distribution. The weights make it possible to get rid of this part, smoothing
the contrast to zero.

Moreover, the convergence of the criterion is sped up by using these smoothing
weights. We illustrate this point in Section 4 by comparing a weighted
criterion with a non weighted one (see Figure 3). Moreover, this result will be
highlighted in the proof of Theorem 3.1.

Practical choice of the $\delta_l$'s

The problem of choosing the weights $(\delta_l)_{l \in \mathbb{Z}}$ in the
definition of the criterion function is important. If we work with $L^2$
functions, the assumption (12) is satisfied for example as soon as
$|\delta_l| = O(|l|^{-2-\nu})$, for some $\nu > 0$. In the simulations our
functions are much more regular. Hence, we have taken
$\delta_l = 1/|l|^{1.3}$. This choice guarantees consistency and good numerical
results. Moreover, in order to illustrate the importance of the weight
sequence, we have also taken $\delta_l = 1$ and $\delta_l = 1/|l|^2$ in
Figure 3. Indeed, when looking at the asymptotic variance, we can see that
there is a trade-off which leads to a lower bound for the smoothing sequence:
the smaller the weights, the larger the variance. Since the function $f$ is
unknown, the sequence cannot depend on its Fourier coefficients; hence the
optimal choice for $(\delta_l)_{l \in \mathbb{Z}}$ should be given by
semi-parametric efficiency. Using the Cauchy-Schwarz inequality, we get that

$\frac{\sum_{l \in \mathbb{Z}} |\delta_l|^4 l^2 |c_l(f)|^2}{\left( \sum_{l \in \mathbb{Z}} |\delta_l|^2 l^2 |c_l(f)|^2 \right)^2} \geq \frac{1}{\sum_{l \in \mathbb{Z}} l^2 |c_l(f)|^2}$.

This case, corresponding to the least favorable case in the semi-parametric
efficiency framework, is obtained for the optimal choice of coefficients
$\delta_l = 1$. If an asymptotic fluctuation result held for this sequence, we
would obtain:

$\sqrt{n \sum_{l} l^2 |c_l(f)|^2} \; (\hat{\alpha}_n - \alpha^*) \xrightarrow[n \to +\infty]{\mathcal{D}} \mathcal{N}_{J-1}(0, I_{J-1} + U_{J-1})$.



Nevertheless, for the choice of the weight sequence $\delta_l = 1$ the
asymptotic normality result does not hold. Non optimality as regards asymptotic
efficiency is the price to pay both to deal with a discretized version of the
regression model and to handle simultaneous estimation for all the unknown
functions. A different estimation method might get rid of this drawback.
Another choice could have been made to smooth the contrast, by restricting the
number of Fourier coefficients, as is done in [7] for example. Some links could
also be established between the estimator we consider and a Bayesian penalized
maximum likelihood estimator, where the weights $(\delta_l)_{l \in \mathbb{Z}}$
stand for a particular choice of a prior over the unknown function $f$. This
Bayesian point of view is tackled in [3]. However, the optimal choice of the
smoothing parameter to obtain efficiency is a very interesting issue in the
semi-parametric framework (i.e. when the weights are not allowed to depend on
the Fourier coefficients of the functions). After the first submission of this
work, this problem was solved in [20].

Remark 3.4 Throughout this work, we assume that the observation noise in the
model (3) is Gaussian. Nevertheless, this assumption could be replaced by
moment conditions on the errors.

4. Applications and simulations 

In this section, we present some numerical applications of the method. The first 
one gives results on simulated data. The second one is based on an experiment 
on human fingers force. The last one is carried out with traffic data. 

The optimization algorithm used in all cases is based on a Krylov method (the
conjugate gradient method). Indeed, minimizing an $L^2$ criterion function with
a conjugate gradient algorithm requires a reduced number of steps, and hence
has a low complexity.

Simulated data 

Simulated data are generated as follows:

$y_{ij} = f(t_i - \theta_j^*) + \varepsilon_{ij}$, $j = 1, \dots, J$, $i = 1, \dots, n$,

with the following choice of parameters: $J = 10$; $n = 100$; the values
$t_i = -\pi + \frac{i-1}{n} 2\pi$, $i = 1, \dots, n$, are equally spaced points
on $[-\pi, \pi[$; $f(t) = 15 \sin(4t)/(4t)$;
$(\theta_2^*, \dots, \theta_J^*)$ are simulated from a uniform law on
$[-\pi/4, \pi/4]$ and $\theta_1^* = 0$; for all $j \in \{1, \dots, J\}$ and all
$i \in \{1, \dots, n\}$, the values $\varepsilon_{ij}$ are simulated from a
Gaussian law with mean 0 and standard deviation 1. Results are given in
Figure 1. The target function $f$ is considered as a $2\pi$-periodic function
($T = 2\pi$), hence $\alpha^* = \theta^*$. The function $f$ is plotted as a
solid line in Figure 1(d). Figure 1(a) shows the simulated data $y_{ij}$,
$j = 1, \dots, J$, $i = 1, \dots, n$. The cross-sectional mean curve of these
data is given in Figure 1(d) by the dotted line. We can see that this mean
function is not representative of the data. Indeed, the amplitude of the
highest optimum is reduced, and the smallest ones have disappeared. Figure 1(c)
shows the curves unshifted by the estimated parameters. The mean function of
these unshifted curves is given in Figure 1(d) by the dashed line. Figure 1(b)
plots $\theta_j^*$ on the abscissa axis against $\hat{\theta}_j$ on the
ordinate axis, $j = 1, \dots, J$. The estimates are very close to the true
parameters. Comparison between mean curves, before and after the shift
estimation, is straightforward.
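
The simulation setting just described can be reproduced along the following lines. This is a minimal sketch: the function names, the random seed and the explicit wrapping of the argument to $[-\pi, \pi[$ (used to treat $f$ as $2\pi$-periodic) are assumptions made for the illustration, and the grid is indexed from 0.

```python
import math
import random


def pattern(t):
    """f(t) = 15 sin(4t)/(4t), made 2*pi-periodic and extended by continuity at 0."""
    u = (t + math.pi) % (2.0 * math.pi) - math.pi   # wrap the argument to [-pi, pi[
    return 15.0 if u == 0.0 else 15.0 * math.sin(4.0 * u) / (4.0 * u)


def simulate(J=10, n=100, sigma=1.0, seed=0):
    """Return (times, shifts, curves) with theta_1 = 0 and uniform shifts otherwise."""
    rng = random.Random(seed)
    times = [-math.pi + 2.0 * math.pi * i / n for i in range(n)]
    shifts = [0.0] + [rng.uniform(-math.pi / 4, math.pi / 4) for _ in range(J - 1)]
    curves = [[pattern(t - th) + sigma * rng.gauss(0.0, 1.0) for t in times]
              for th in shifts]
    return times, shifts, curves


def cross_sectional_mean(curves):
    """Pointwise mean of the curves over the J individuals."""
    return [sum(column) / len(column) for column in zip(*curves)]


times, shifts, curves = simulate()
mean_curve = cross_sectional_mean(curves)
```

Calling `simulate(sigma=0.0)` removes the noise, which makes it easy to see how averaging shifted curves flattens the main peak of the pattern.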
Figure 1. Estimation results with the M-estimation methodology.

We now compare our estimates with those obtained with an existing method:
curve registration by landmarks. This method aims at aligning curves by first
estimating landmarks of the curves (here, the maximum) and then aligning these
landmarks. For more details on this procedure, see [5]. In Figure 2 we show the
results on our simulated data. These results are not as good as those we obtain
with our method. This can be explained by the fact that we first need to
estimate each curve maximum by a non parametric method (a kernel estimation),
which leads to estimation errors. Moreover, our method uses all the information
given by the data, not only that given by landmarks.
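
For comparison, landmark registration on the maximum can be sketched as follows. This is only an illustration of the idea: a circular moving average stands in for the kernel estimate, the window size and the test pattern are arbitrary assumptions, and shifts are read off as raw differences of peak locations (no wrapping modulo the period).

```python
import math


def moving_average(y, half_window):
    """Circular moving average, a crude stand-in for a kernel smoother."""
    n = len(y)
    width = 2 * half_window + 1
    return [sum(y[(i + k) % n] for k in range(-half_window, half_window + 1)) / width
            for i in range(n)]


def landmark_shifts(times, curves, half_window=2):
    """Estimate shifts as differences between the locations of the smoothed maxima,
    taking the first curve as the reference."""
    peaks = []
    for y in curves:
        smooth = moving_average(y, half_window)
        peaks.append(times[max(range(len(y)), key=smooth.__getitem__)])
    return [p - peaks[0] for p in peaks]


# Noiseless example with a hypothetical single-peaked periodic pattern.
n = 64
times = [2.0 * math.pi * m / n for m in range(n)]
step = times[1]
true_shifts = [0.0, 10 * step, 5 * step]
curves = [[math.exp(math.cos(t - th)) for t in times] for th in true_shifts]
estimated = landmark_shifts(times, curves)
```

Because only the location of one smoothed extremum is used, the estimate is limited by the grid resolution and by the smoothing error, in line with the remark above that the landmark approach discards most of the information in the curves.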

In order to illustrate the importance of the weight sequence, we now compare
the criterion function for various values of $(\delta_l)_{l \in \mathbb{Z}}$.
For this purpose, simulated data sets are generated with $J = 2$,
$\theta_1^* = 0$ and $\theta_2^* = \pi/3$. Figure 3 shows the obtained results.
The first column of this figure presents these simulated data sets, with,
respectively, $\sigma = 1$ in Figure 3(a,1), $\sigma = 3$ in Figure 3(a,2),
$\sigma = 5$ in Figure 3(a,3) and $\sigma = 7$ in Figure 3(a,4). The second
column presents the unweighted criterion functions, i.e. with $\delta_l = 1$,
associated respectively with (a,1), (a,2), (a,3) and (a,4). The third and
fourth columns present the associated weighted criterion functions
$M_n(\cdot)$ with, respectively, $\delta_l = 1/|l|^{1.3}$ and
$\delta_l = 1/|l|^2$. In all these figures, the vertical dashed line represents
the value $\pi/3$ where the minimum value of our criterion function is
achieved. It clearly appears that without the weight sequence, i.e. with
$\delta_l = 1$, the criterion function converges to a random process. Moreover,
the variance of this random process is proportional to the noise variance
$\sigma^2$. We also see that even with a large noise variance, as in
Figure 3(a,4), our weighted criterion functions, (c,4) and (d,4), are smooth,
with a unique minimum around $\pi/3$. This shows that our procedure is quite
robust to the SNR. Moreover, it appears that the exponent $\beta > 1.25$ of the
weight sequence $\delta_l = 1/|l|^\beta$ only affects the amplitude of the
M-function.

Figure 2. Estimation results with the landmark methodology.

These numerical results emphasize the fact that the weight sequence
$(\delta_l)_{l \in \mathbb{Z}}$ is important, but that its value can be easily
chosen.

Pinch force data 

The data presented here are taken from an experiment described in [16] with a
Curve Registration methodology. The data represent the force exerted by the
thumb and forefinger on a force meter during 20 brief pinches. Since these 20
force measurements have arbitrary beginnings, Ramsay and Li in [16] begin their
study by a landmark alignment of curve maxima (with single shifts). These
aligned data are shown in Figure 4(a).

Our purpose is to study these data with our shift estimation methodology.
Shift estimates and unshifted curves are respectively shown in Figure 4(b) and
Figure 4(c). In Figure 4(b), we only show a boxplot of the estimated parameters
because, obviously, we do not know the real parameters. We note that the shift
parameters are almost all close to zero, between $-10^{-3}$ and
$3 \times 10^{-3}$. In this case, landmark alignment unshifts the data quite
well. Nevertheless, comparing Figure 4(a) and Figure 4(c), we can see that the
curves are slightly better aligned after our shift estimation methodology. In
Figure 4(d), the cross-sectional mean curves of the unshifted curves (solid
line) and of the primary curves (dotted line) are almost the same.

Figure 3. Criterion functions, with different values of the weight sequence $(\delta_l)$.

Application to road traffic forecasting 

Most of the Parisian road traffic network is equipped with a road traffic
measurement infrastructure. The main elements of this infrastructure are
counting stations. These sensors are situated approximately every 500 meters on
main trunk roads (principally motorways and speedways). Every counting station
measures, daily, the average speed of the vehicle flow over 6-minute periods.
We consider measurements from 5 AM to 11 PM; hence, the length of the daily
measurement is 180. We denote by $y_{ij}$ the speed measurement of day
$j \in \{1, \dots, J\}$ and of period $i \in \{1, \dots, n\}$, with $n = 180$.

Our purpose is to improve, with our shift estimation methodology, an existing
forecasting methodology, which is described in [14]. This procedure is based on
a classification method. We have a sample of $J$ speed curves and we want to
summarize it by a small number $N$ of standard profiles, each representative of
a cluster.

Figure 4. Shift estimation results on the pinch force data set.

Consider several clusters of $J$ speed curves. We frequently note that many
subgroups are composed of curves describing the same behavior. For example, we
observe a speed curve subgroup with the same traffic jam or speed reduction,
but with different starting times for each curve. Figure 5(a) represents a
particular cluster at a particular counting station. Figure 5(b) is a boxplot
of the estimated shifts $\hat{\theta}_j$, $j = 1, \dots, J$. Unshifted curves
are plotted in Figure 5(c). So, in this homogeneous cluster, where only a shift
phenomenon appears, the mean curves in Figure 5(d) of the unshifted curves
(solid line) and of the primary curves (dotted line) are not the same. The mean
obtained after shift estimation is clearly more representative of the
individual pattern.



5. Technical Lemmas 



The two following propositions, Proposition 5.1 and Proposition 5.2, are used
in the proof of asymptotic normality (Theorem 3.1). Their proofs are postponed
to the appendix.




Figure 5. Shift estimation results on a particular traffic data set. 



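Proposition 5.1 Assume that the $\delta_l$'s are such that

$\sum_{l \in \mathbb{Z}} l^2 |\delta_l|^4 |c_l(f)|^2 < +\infty$.

Then

$\sqrt{n} \, \nabla M_n(\alpha^*) \xrightarrow[n \to +\infty]{\mathcal{D}} \mathcal{N}_{J-1}(0, \Gamma_0)$,

where the variance matrix is
$\Gamma_0 = \frac{4}{J^2} \sum_{l \in \mathbb{Z}} |\delta_l|^4 l^2 |c_l(f)|^2 \left( I_{J-1} - \frac{1}{J} U_{J-1} \right)$.
Proposition 5.2 Assume moreover that the sequence $\delta_l$ satisfies

$\sum_{l \in \mathbb{Z}} l^2 |\delta_l|^2 |c_l(f)|^2 < +\infty$ and $\sum_{l \in \mathbb{Z}} l^4 |\delta_l|^4 |c_l(f)|^2 < +\infty$.

Then, for any sequence $(\bar{\alpha}_n)_{n \in \mathbb{N}}$ such that
$\|\bar{\alpha}_n - \alpha^*\| \leq \|\hat{\alpha}_n - \alpha^*\|$
($n \in \mathbb{N}$), we have

$\nabla^2 M_n(\bar{\alpha}_n) \xrightarrow[n \to +\infty]{P_{\alpha^*}} \frac{2}{J^2} \sum_{l \in \mathbb{Z}} |\delta_l|^2 l^2 |c_l(f)|^2 \left( J I_{J-1} - U_{J-1} \right)$. (21)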



6. Appendix 

Let $z$ be a complex number and $\bar{z}$ its conjugate. We write
$\mathrm{Re}(z) = \frac{1}{2}(z + \bar{z})$ (the real part of $z$) and
$\mathrm{Im}(z) = \frac{1}{2i}(z - \bar{z})$ (the imaginary part of $z$).

Proof 6.1 (Proof of Theorem 2.1) In the sequel we assume without loss of
generality that all the $\alpha_j^*$ are equal to 0. Consider the following
notation: for all $j = 1, \dots, J$, for all $l = -(n-1)/2, \dots, (n-1)/2$,
$w_{jl} = \frac{1}{\sqrt{n}} (\xi_{jl} + i \zeta_{jl})$. Here, $(\xi_{jl})$ and
$(\zeta_{jl})$ are independent Gaussian sequences, with law
$\mathcal{N}_n(0, I_n)$. Also, set

$\forall l = -(n-1)/2, \dots, (n-1)/2$, $c_l(f) = |c_l(f)| e^{i \theta_l}$, with $\theta_l \in [0, 2\pi[$.

J 



7 Ei^(«)i 2 -iq(«)i 2 



Using the following decompositions

$|\tilde{c}_{jl}(\alpha)|^2 = |c_l(f)|^2 + |w_{jl}|^2 + 2 \, \mathrm{Re}[c_l(f) \bar{w}_{jl}]$,



l^(«)| S 



7 



+ 2<He 



lead to the following expression of the criterion function $M_n(\alpha)$



M n (a)= £ |^| 2 h(/)| 2 " £ 



J-lAl 



E^ E i«i a (s« a +f 



J2 z — ✓ n 

i=i i 



(22) 
(23) 



J?EE^ E l*| 2 [cos(J(«i-«*))(^+^) 

3 = 1 k>j l=-Z~l 

+ <M««i-«0)($&-$£w)l (24) 
+



2(J-1)^ 1 



-|EE4 E n 2 iq(/)|[cosG( 



cos(^)^+sin(^)e 



aj-a*)+ft)Si 



(25) 



+ sm(l(aj -a k ) + 6»/)£^] . 

We have split the criterion function into four different terms: (22), (23),
(24), and (25). We aim at proving the convergence of these terms to a
deterministic contrast function and the uniform convergence of their
increments.

• The term (22) is a deterministic one. Using Parseval's theorem, we have
that



n — >+oo 



T |(/^)WI 2 f 



3 = 1 



dt 



• The term (23) is a pure noise term composed of terms of the type
$\frac{1}{n} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 \xi_{jl}^2$. Since the
$\xi_{jl}$, $l \in \mathbb{Z}$, are independent, by the SLLN we get

$\frac{1}{n} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 \xi_{jl}^2 \xrightarrow[n \to +\infty]{P_{\alpha^*}} A(\delta) < +\infty$

for a constant $A(\delta)$ which only depends on the choice of the smoothing
sequence and not on the unknown parameter $\alpha$. This constant is bounded
since the weights are bounded. Note that if $\mu$ has a density lying in $L^2$,
this constant vanishes since $\frac{1}{n} \sum_{l \in \mathbb{Z}} |\delta_l|^2 \to 0$.
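• The first term in (24) is also a pure noise term composed of terms of the type

$$U_n(\alpha_j - \alpha_k) = \frac{1}{n}\sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 \cos\big(l(\alpha_j - \alpha_k)\big)\, \xi_{jl}\,\xi_{kl}, \qquad k \neq j.$$

One has $E(U_n) = 0$ and $E\Big(\exp\Big(t \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 |\xi_{jl}\xi_{kl}|\Big)\Big) < +\infty$. Thus for $a, b > 0$ a Bernstein type inequality holds giving

$$P_{\alpha^*}\Big( \sup_{\alpha_j - \alpha_k \in [-2\pi, 2\pi] \cap \mathbb{Z}/n^2} |U_n(\alpha_j - \alpha_k)| > x \Big) \le O\Big[ n^2 \exp\Big( -\frac{1}{2}\, \frac{n^2 x^2}{a + b\,n x} \Big) \Big].$$

Choosing $x_n = \sqrt{8b\log(n)/n}$, we get both

$$x_n \to 0 \quad \text{and} \quad P_{\alpha^*}\Big( \sup_{\alpha_j - \alpha_k \in [-2\pi, 2\pi] \cap \mathbb{Z}/n^2} |U_n(\alpha_j - \alpha_k)| > x_n \Big) \to 0.$$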
For $|h| < 1/n^2$ the inequality $|\cos(lh) - 1| < 1/n$ leads to

$$|U_n(\alpha_j - \alpha_k + h) - U_n(\alpha_j - \alpha_k)| \le \frac{1}{n^2} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 |\xi_{jl}\xi_{kl}|,$$

so that

$$\sup_{\alpha_j - \alpha_k \in [-2\pi, 2\pi]} |U_n(\alpha_j - \alpha_k)| \xrightarrow[n\to+\infty]{P_{\alpha^*}} 0.$$

Hence the first term in (24) goes to 0 in probability.
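• The remaining term in (24) is similar to the term (25), which has the same asymptotic behavior as

$$V_n(\alpha_j - \alpha_k) = \frac{1}{\sqrt n} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 |c_l(f)| \cos\big(l(\alpha_j - \alpha_k) + \theta_l\big)\, \xi_l.$$

Here we only give the proof of the uniform convergence of (25), which holds under slight modifications for the second term of (24). As the noise is Gaussian we get

$$E\big(V_n(\alpha_j - \alpha_k)\big) = 0, \qquad \mathrm{Var}\, V_n(\alpha_j - \alpha_k) \le \frac{1}{n} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^4 |c_l(f)|^2 \le \frac{1}{n} \sum_{l=-\infty}^{+\infty} |\delta_l|^4 |c_l(f)|^2,$$

so that

$$P_{\alpha^*}\Bigg( \sup_{\alpha_j - \alpha_k \in [-2\pi, 2\pi] \cap \mathbb{Z}/n^2} |V_n(\alpha_j - \alpha_k)| > \sqrt{\frac{16 \log n \sum_{l=-\infty}^{+\infty} |\delta_l|^4 |c_l(f)|^2}{n}} \Bigg) \le K n^2 \exp(-4 \log n).$$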
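Using again the previous bound for $|h| < 1/n^2$, we obtain

$$|V_n(\alpha_j - \alpha_k + h) - V_n(\alpha_j - \alpha_k)| \le \frac{1}{n\sqrt n} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2 |c_l(f)|\, |\xi_l| \le \frac{1}{n} \Big( \sum_{l=-\infty}^{+\infty} |\delta_l|^4 |c_l(f)|^2 \Big)^{1/2} \Big( \frac{1}{n} \sum_{l=-(n-1)/2}^{(n-1)/2} \xi_l^2 \Big)^{1/2},$$

so that

$$\sup_{\alpha_j - \alpha_k \in [-2\pi, 2\pi]} |V_n(\alpha_j - \alpha_k)| \xrightarrow[n\to+\infty]{P_{\alpha^*}} 0.$$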

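In conclusion, we have that

$$\sup_{\alpha \in [-\pi, \pi[^J} \Big| M_n(\alpha) - K(\alpha) - \frac{2(J-1)}{J} A(\delta) \Big| \xrightarrow[n\to+\infty]{P_{\alpha^*}} 0,$$

with

$$K(\alpha) = \int |(f\star\mu)(t)|^2\, dt - \int \Big| \frac{1}{J} \sum_{j=1}^{J} (f\star\mu)(t + \alpha_j - \alpha_j^*) \Big|^2 dt,$$

the integrals being taken over one period. This ensures that the convergence property in (10) holds. It remains to be seen that the asymptotic contrast enables us to identify the $\alpha_j$'s, which concludes the proof of Condition (10). Cauchy-Schwarz inequality yields that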



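$$\int \Big| \frac{1}{J} \sum_{j=1}^{J} (f\star\mu)(t + \alpha_j - \alpha_j^*) \Big|^2 dt \le \int \frac{1}{J} \sum_{j=1}^{J} \big| (f\star\mu)(t + \alpha_j - \alpha_j^*) \big|^2 dt = \int |(f\star\mu)(t)|^2\, dt,$$

hence, $K(\cdot) \ge 0$, and the minimum value is reached for

$$\int |(f\star\mu)(t)|^2\, dt = \int \Big| \frac{1}{J} \sum_{j=1}^{J} (f\star\mu)(t + \alpha_j - \alpha_j^*) \Big|^2 dt,$$

which is equivalent, using Parseval theorem, to

$$\sum_{l} |\delta_l\, c_l(f)|^2 = \sum_{l} \Big| \frac{1}{J} \sum_{j=1}^{J} \delta_l\, c_l(f)\, e^{il(\alpha_j - \alpha_j^*)} \Big|^2.$$

So, we have that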



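$$\forall l \in \{ l : c_l(f) \neq 0 \}, \qquad \Big| \frac{1}{J} \sum_{j=1}^{J} e^{il(\alpha_j - \alpha_j^*)} \Big| = 1.$$

Now, from (4) this implies that

$$\forall j = 1, \dots, J, \quad \alpha_j = \alpha_j^* + c \ [2\pi], \quad c \in \mathbb{R}.$$

In a matrix way, we get the equation (5), i.e.

$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_J \end{pmatrix} = \begin{pmatrix} \alpha_1^* \\ \vdots \\ \alpha_J^* \end{pmatrix} + c \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \quad c \in \mathbb{R}. \qquad (26)$$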
Hence, since $\alpha \in A$ and $A$ is defined by (6), we have shown that $\alpha_j = \alpha_j^*$ for all $j = 1, \dots, J$. Since $\alpha \mapsto K(\alpha)$ achieves its unique minimum for $\alpha = \alpha^*$, the condition (10) is fulfilled.

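Proof 6.2 (Proof of Proposition 5.1) The first and the second derivatives of the empirical contrast, for all $k \in \{2, \dots, J\}$, for all $m \in \{2, \dots, J\}$, can be written as:

$$\frac{\partial M_n}{\partial \alpha_k}(\alpha) = \frac{2}{J} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l\ \Im m\Big( \tilde c_{kl}(\alpha)\, \overline{\hat c_l(\alpha)} \Big), \qquad (27)$$

$$\frac{\partial^2 M_n}{\partial \alpha_k^2}(\alpha) = \frac{2}{J^2} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l^2\ \Re e\Big( \tilde c_{kl}(\alpha) \sum_{j \neq k} \overline{\tilde c_{jl}(\alpha)} \Big), \qquad (28)$$

$$\forall m \neq k, \quad \frac{\partial^2 M_n}{\partial \alpha_k \partial \alpha_m}(\alpha) = -\frac{2}{J^2} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l^2\ \Re e\Big( \tilde c_{kl}(\alpha)\, \overline{\tilde c_{ml}(\alpha)} \Big). \qquad (29)$$

By straightforward calculations, we get that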

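$$\sqrt n\, \frac{\partial M_n}{\partial \alpha_k}(\alpha^*) = \frac{2}{J} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l\, \Big( |c_l(f)| \big( V_l^k - \bar V_l \big) + \frac{1}{\sqrt n} W_l^k \Big),$$

where, for all $l \in \mathbb{Z}$,

$$W_l^k = \frac{1}{J} \sum_{j=1}^{J} \Big[ \sin\big(l(\alpha_k^* - \alpha_j^*)\big) \big( \xi^x_{kl}\xi^x_{jl} + \xi^y_{kl}\xi^y_{jl} \big) + \cos\big(l(\alpha_k^* - \alpha_j^*)\big) \big( \xi^y_{kl}\xi^x_{jl} - \xi^x_{kl}\xi^y_{jl} \big) \Big],$$

$$V_l^k = \cos(l\alpha_k^* + \theta_l)\, \xi^x_{kl} - \sin(l\alpha_k^* + \theta_l)\, \xi^y_{kl}, \quad \text{and} \quad \bar V_l = \frac{1}{J} \sum_{k=1}^{J} V_l^k.$$

Let, for $l \in \mathbb{Z}$, $Y_l = (\xi^x_{1l}\ \xi^x_{2l} \cdots \xi^x_{Jl}\ \xi^y_{1l}\ \xi^y_{2l} \cdots \xi^y_{Jl})'$, and $f_l^k$ be the vector of length $2J$, defined by $(f_l^k)_k = \cos(l\alpha_k^* + \theta_l)$, $(f_l^k)_{J+k} = -\sin(l\alpha_k^* + \theta_l)$, and $(f_l^k)_i = 0$ if $i \notin \{k, J+k\}$. As a consequence, we get the following expression for $V_l^k$: $V_l^k = \langle f_l^k, Y_l \rangle = (f_l^k)' Y_l$. In the same way, for $l \in \mathbb{Z}$, let $B_l^k$ be the $(2J) \times (2J)$ matrix defined by rows by

$$(B_l^k)_k = \big( \sin[l(\alpha_k^* - \alpha_1^*)] \ \cdots\ \sin[l(\alpha_k^* - \alpha_J^*)] \quad -\cos[l(\alpha_k^* - \alpha_1^*)] \ \cdots\ -\cos[l(\alpha_k^* - \alpha_J^*)] \big),$$

$$(B_l^k)_{J+k} = \big( \cos[l(\alpha_k^* - \alpha_1^*)] \ \cdots\ \cos[l(\alpha_k^* - \alpha_J^*)] \quad \sin[l(\alpha_k^* - \alpha_1^*)] \ \cdots\ \sin[l(\alpha_k^* - \alpha_J^*)] \big),$$

$$(B_l^k)_i = (0 \ \cdots\ 0) \quad \text{if } i \notin \{k, J+k\}.$$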
Further, let the symmetric matrix $\bar B_l^k$ be defined by $\bar B_l^k = \frac{B_l^k + (B_l^k)'}{2}$. Hence, write $W_l^k = \frac{1}{J} Y_l' \bar B_l^k Y_l$. Define also, for $k = 2, \dots, J$, $\tilde B_l^k = \frac{2}{J} \bar B_l^k$ and $\tilde f_l^k = \frac{2}{J} f_l^k$.

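Our aim is to study the asymptotic distribution of the gradient $\sqrt n\, \nabla M_n(\alpha^*)$. For this purpose consider $u = (u_2, \dots, u_J) \in \mathbb{R}^{J-1}$ and $t = (t_2, \dots, t_J) \in \mathbb{R}^{J-1}$, and define the couple of random variables:

$$R_n = \frac{2}{J} \sum_{k=2}^{J} u_k \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l\, |c_l(f)| \big( V_l^k - \bar V_l \big), \qquad S_n = \frac{2}{J\sqrt n} \sum_{k=2}^{J} t_k \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l\, W_l^k.$$

Using previous notations, we get

$$R_n = \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l\, |c_l(f)|\, \langle g_l(u), Y_l \rangle, \quad \text{with } g_l(u) = \sum_{k=2}^{J} u_k \Big( \tilde f_l^k - \frac{1}{J} \sum_{j=1}^{J} \tilde f_l^j \Big),$$

$$S_n = \frac{1}{J\sqrt n} \sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l\ Y_l' \tilde B_l(t) Y_l, \quad \text{with } \tilde B_l(t) = \sum_{k=2}^{J} t_k \tilde B_l^k.$$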



First note that $E(S_n) = 0$. Moreover, Assumption (16) implies that $\mathrm{Var}(S_n) \to 0$ as $n \to +\infty$. Hence, $S_n \xrightarrow[n\to+\infty]{P_{\alpha^*}} 0$. So, the quadratic part vanishes in probability when $n$ increases.

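For the last term, we have that $\langle g_l(u), Y_l \rangle \sim \mathcal{N}\big(0, \|g_l(u)\|^2\big)$, where $\|g_l(u)\|^2 = \frac{4}{J^2}\, u' \big( I_{J-1} - \frac{1}{J} U_{J-1} \big) u$, with $I_{J-1}$ the $(J-1) \times (J-1)$ identity matrix, and $U_{J-1}$ the $(J-1) \times (J-1)$ matrix whose entries are all equal to 1. The independence of the $Y_l$'s yields that under Assumption (15)

$$R_n \xrightarrow[n\to+\infty]{\mathcal{D}} \mathcal{N}\Big( 0,\ \frac{4}{J^2} \sum_{l\in\mathbb{Z}} |\delta_l|^4\, l^2\, |c_l(f)|^2\ u' \big( I_{J-1} - \tfrac{1}{J} U_{J-1} \big) u \Big).$$

Finally we get that

$$\sqrt n\, \nabla M_n(\alpha^*) \xrightarrow[n\to+\infty]{\mathcal{D}} \mathcal{N}_{J-1}\Big( 0,\ \frac{4}{J^2} \sum_{l\in\mathbb{Z}} |\delta_l|^4\, l^2\, |c_l(f)|^2 \big( I_{J-1} - \tfrac{1}{J} U_{J-1} \big) \Big).$$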
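The matrix $I_{J-1} - \frac{1}{J} U_{J-1}$ in the limiting covariance has a simple reading: it is the covariance of centered i.i.d. standard Gaussian variables restricted to the coordinates $k = 2, \dots, J$, which is the structure of the differences $V_l^k - \bar V_l$ in the gradient. The sketch below, an illustration with hypothetical names, checks this identity exactly through the centering matrix and also reports the spectrum of the limit matrix.

```python
import numpy as np

def centered_covariance(J):
    """Covariance of (Z_k - mean(Z))_{k=2..J} for i.i.d. standard normal Z_1..Z_J."""
    centering = np.eye(J) - np.ones((J, J)) / J     # maps Z to Z - mean(Z)
    A = centering[1:, :]                            # keep coordinates k = 2, ..., J
    return A @ A.T                                  # Cov(A Z) = A A^T because Cov(Z) = I

J = 5                                               # hypothetical number of curves
cov = centered_covariance(J)
target = np.eye(J - 1) - np.ones((J - 1, J - 1)) / J
eigenvalues = np.sort(np.linalg.eigvalsh(target))  # 1/J once, then 1 with multiplicity J-2
```

Since the smallest eigenvalue is $1/J > 0$, the limiting covariance is non-degenerate as soon as $\sum_l |\delta_l|^4 l^2 |c_l(f)|^2 > 0$.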

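Proof 6.3 (Proof of Proposition 5.2) First, we pay attention to the non-diagonal terms of the matrix of the second derivatives. For $m \neq k$, we get after some calculations:

$$
\begin{aligned}
-\frac{J^2}{2}\, \frac{\partial^2 M_n}{\partial \alpha_k \partial \alpha_m}(\alpha) ={}& \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)|^2 \cos\big( l[\alpha_k - \alpha_k^* + \alpha_m^* - \alpha_m] \big) && (30)\\
&+ \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)| \Big( \cos[l(\alpha_k - \alpha_k^* - \alpha_m) + \theta_l]\, w^x_{ml} \\
&\qquad\qquad + \sin[l(\alpha_k - \alpha_k^* - \alpha_m) + \theta_l]\, w^y_{ml} \Big) && (31)\\
&+ \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)| \Big( \cos[l(\alpha_m - \alpha_m^* - \alpha_k) + \theta_l]\, w^x_{kl} \\
&\qquad\qquad + \sin[l(\alpha_m - \alpha_m^* - \alpha_k) + \theta_l]\, w^y_{kl} \Big) && (32)\\
&+ \sum_{l} |\delta_l|^2\, l^2 \Big( \cos(l[\alpha_k - \alpha_m]) \big( w^x_{kl} w^x_{ml} + w^y_{kl} w^y_{ml} \big) \\
&\qquad\qquad - \sin(l[\alpha_k - \alpha_m]) \big( w^y_{kl} w^x_{ml} - w^x_{kl} w^y_{ml} \big) \Big), && (33)
\end{aligned}
$$

where all sums over $l$ run from $-(n-1)/2$ to $(n-1)/2$.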

We now study the asymptotic behaviour of each term separately. Indeed, the second derivatives are taken at a point $\bar\alpha_n$ which converges to $\alpha^*$: $\bar\alpha_n$ is in the neighborhood of $\alpha^*$ with radius $\|\alpha^* - \hat\alpha_n\|$. Hence, we need conditions to claim uniform convergence of $\nabla^2 M_n(\cdot)$.

First note that the deterministic term $\sum_{l=-(n-1)/2}^{(n-1)/2} |\delta_l|^2\, l^2\, |c_l(f)|^2 \cos\big( l[\alpha_k - \alpha_k^* + \alpha_m^* - \alpha_m] \big)$ converges towards $\sum_{l} |\delta_l|^2\, l^2\, |c_l(f)|^2$ as soon as $\sum_{l} |\delta_l|^2\, l^4\, |c_l(f)|^2 < +\infty$, as assumed in (12).

Now consider the random terms. Since for all $k \in \{1, \dots, J\}$, the random variables $w^x_{kl}$ and $w^y_{kl}$ follow a Gaussian law $\mathcal{N}(0, 1/n)$, we consider the independent variables $\xi^x_{kl}$ and $\xi^y_{kl}$ such that $w^x_{kl} = \frac{1}{\sqrt n}\, \xi^x_{kl}$ and $w^y_{kl} = \frac{1}{\sqrt n}\, \xi^y_{kl}$. For the two second terms ((31) and (32)), we write



$$(31) = \frac{1}{\sqrt n} \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)| \Big( \cos[l(\alpha_k - \alpha_k^* - \alpha_m) + \theta_l]\, \xi^x_{ml} + \sin[l(\alpha_k - \alpha_k^* - \alpha_m) + \theta_l]\, \xi^y_{ml} \Big),$$

$$(32) = \frac{1}{\sqrt n} \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)| \Big( \cos[l(\alpha_m - \alpha_m^* - \alpha_k) + \theta_l]\, \xi^x_{kl} + \sin[l(\alpha_m - \alpha_m^* - \alpha_k) + \theta_l]\, \xi^y_{kl} \Big),$$

whose asymptotic behaviours are of the same nature. As in the proof of Proposition 5.1, Condition (19) leads to $\sup_{\alpha} |(31)| \xrightarrow[n\to+\infty]{P_{\alpha^*}} 0$ and $\sup_{\alpha} |(32)| \xrightarrow[n\to+\infty]{P_{\alpha^*}} 0$. Further, Assumption (20) implies that (33) converges in probability uniformly to 0. The diagonal terms can be written as follows:

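$$
\begin{aligned}
\frac{J^2}{2}\, \frac{\partial^2 M_n}{\partial \alpha_k^2}(\alpha) ={}& \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)|^2 \sum_{j \neq k} \cos\big( l[\alpha_k - \alpha_k^* + \alpha_j^* - \alpha_j] \big) && (34)\\
&+ \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)| \sum_{j \neq k} \Big( \cos[l(\alpha_k - \alpha_k^* - \alpha_j) + \theta_l]\, w^x_{jl} \\
&\qquad\qquad + \sin[l(\alpha_k - \alpha_k^* - \alpha_j) + \theta_l]\, w^y_{jl} \Big) && (35)\\
&+ \sum_{l} |\delta_l|^2\, l^2\, |c_l(f)| \sum_{j \neq k} \Big( \cos[l(\alpha_j - \alpha_j^* - \alpha_k) + \theta_l]\, w^x_{kl} \\
&\qquad\qquad + \sin[l(\alpha_j - \alpha_j^* - \alpha_k) + \theta_l]\, w^y_{kl} \Big) && (36)\\
&+ \sum_{l} |\delta_l|^2\, l^2 \sum_{j \neq k} \Big( \cos(l[\alpha_k - \alpha_j]) \big( w^x_{kl} w^x_{jl} + w^y_{kl} w^y_{jl} \big) \\
&\qquad\qquad - \sin(l[\alpha_k - \alpha_j]) \big( w^y_{kl} w^x_{jl} - w^x_{kl} w^y_{jl} \big) \Big), && (37)
\end{aligned}
$$

where all sums over $l$ run from $-(n-1)/2$ to $(n-1)/2$.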

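Using similar arguments as for the previous terms, we can see that, under the same assumptions, all the terms (34), (35), (36) and (37) converge uniformly, and we get

$$\frac{\partial^2 M_n}{\partial \alpha_k^2}(\bar\alpha_n) \xrightarrow[n\to+\infty]{P_{\alpha^*}} \frac{2(J-1)}{J^2} \sum_{l\in\mathbb{Z}} |\delta_l|^2\, l^2\, |c_l(f)|^2.$$

As a result, gathering the two previous results leads to the following asymptotic behavior:

$$\nabla^2 M_n(\bar\alpha_n) \xrightarrow[n\to+\infty]{P_{\alpha^*}} \frac{2}{J^2} \sum_{l\in\mathbb{Z}} |\delta_l|^2\, l^2\, |c_l(f)|^2 \left( J I_{J-1} - U_{J-1} \right),$$

which proves the result. Moreover, this matrix is invertible. As a result, we have that

$$\left[ \nabla^2 M_n(\bar\alpha_n) \right]^{-1} \xrightarrow[n\to+\infty]{P_{\alpha^*}} \frac{J}{2 \sum_{l\in\mathbb{Z}} |\delta_l|^2\, l^2\, |c_l(f)|^2} \left( I_{J-1} + U_{J-1} \right).$$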
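The closed form of the inverse follows from $U_{J-1}^2 = (J-1)\, U_{J-1}$, which gives $(J I_{J-1} - U_{J-1})(I_{J-1} + U_{J-1}) = J I_{J-1}$. A minimal numerical check is sketched below; the value of `S`, standing for $\sum_l |\delta_l|^2 l^2 |c_l(f)|^2$, and the function names are hypothetical.

```python
import numpy as np

def hessian_limit(J, S):
    """(2/J^2) * S * (J*I - U) on R^(J-1); S stands for sum_l |delta_l|^2 l^2 |c_l(f)|^2."""
    return 2.0 / J**2 * S * (J * np.eye(J - 1) - np.ones((J - 1, J - 1)))

def hessian_limit_inverse(J, S):
    """Closed form J/(2S) * (I + U), obtained from U @ U = (J-1) * U."""
    return J / (2.0 * S) * (np.eye(J - 1) + np.ones((J - 1, J - 1)))

J, S = 6, 1.7                                       # hypothetical values
product = hessian_limit(J, S) @ hessian_limit_inverse(J, S)
```

The product should be the identity of size $J-1$ for any $J \ge 2$ and any $S > 0$.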